Abstract. Neural cryptography is based on synchronization of tree parity
machines by mutual learning. We extend previous key-exchange protocols by 
replacing random inputs with queries depending on the current state of the neural 
networks. The probability of a successful attack is calculated for different model 
parameters using numerical simulations. The results show that queries restore the 
security against cooperating attackers. The success probability can be reduced 
without increasing the average synchronization time. 
1. Introduction

Neural cryptography [1,2] is a method for generating secret information over a public 
channel. Before two partners A and B can exchange a secret message over a public 
communication channel they have to agree on a secret encryption key. Using number 
theoretic methods, such keys can be constructed over public channels without previous 
secret agreements of the two partners [3]. Any details of the algorithm as well as the
complete information passed between the partners are known to a possible attacker E;
nevertheless the final key is secret: it is known to the two partners A and B only, and
an opponent E with limited computer power cannot calculate the key. The method is
based on the computational difficulty of factorizing large numbers or calculating the
discrete logarithm of large numbers [3].

Recently it has been demonstrated that secret keys can be generated by a 
completely different method [1], as well. This method is based on synchronization 
of neural networks by mutual learning [4,5]. The secret key is generated by the 
dynamics of a complex physical process, namely the competition between stochastic 
attractive and repulsive forces which act on the weights of the two neural networks of 
the partners A and B. Two dynamical systems which synchronize by mutual signals 
have an advantage over an attacker E which can only synchronize by listening to the 
exchanged signals [2]. Finally the key is taken as the synchronized weights of the two
networks A and B. 

The security of neural cryptography is still being debated [6-11]. Since the 
method is based on a stochastic process, there is a small chance that an attacker 
synchronizes to the key, as well. However, it has been found that the model parameter 
L (the synaptic depth defined below) determines the security of the system: the success 
probability of the attacker decreases exponentially with L while the synchronization
time, i.e. the amount of effort for agreeing on a key, increases only as $L^2$. Hence, by
increasing the value of L the security of neural cryptography can be increased to any 
desired level [9]. 

These scaling laws are not obvious at all. Neural cryptography is based on a 
subtle difference between bi-directional and uni-directional couplings of a stochastic 
process. Our understanding of these processes is still incomplete. Hence it is still 
possible that a clever algorithm may destroy the security of the method. 

In fact, there is a special algorithm of neural cryptography, the Hebbian rule 
defined below, which allows a clear determination of scaling laws with respect to the 
parameter L. Recently it has been shown that a special attack based on the majority 
of an ensemble of attacking networks destroys the security of the method: the success 
probability is constant for large values of L [11]. These results are the motivation for 
the present investigation. We develop a new kind of learning rule which restores the 
security: the success probability decreases exponentially with L. This new rule uses 
learning by queries [12] which is a well-known principle in the theory of learning by 
examples [13]. It is based on exchanging inputs between A and B which are correlated 
to the weight vectors of the two networks. 

It should be mentioned that there are a few other rules (anti-Hebbian or random
walk) with lower success probabilities for which the scaling properties with respect to
the parameter L could not be determined yet. This means that we still do not know whether
the majority attack destroys the security of those algorithms as well. Nevertheless it
is important to develop a rule which restores security for the Hebbian rule where the 
majority attack is clearly successful. 

Our findings allow us to reach the conclusion that for all algorithms suggested so 
far the scaling laws for the success probability hold: the security of neural cryptography 
can be increased to any desired level. 



2. Neural cryptography 

In this section we repeat the definition of neural cryptography. Each of the two 
partners A and B uses a special neural network called tree parity machine (TPM). As
shown in figure 1, a TPM consists of K hidden units $\sigma_k$ with weight vectors $\mathbf{w}_k$ and
input vectors $\mathbf{x}_k$. The components of the input vectors are binary and the weights are
discrete numbers with depth L,

$$x_{k,j} \in \{-1,+1\}, \qquad w_{k,j} \in \{-L,-L+1,\dots,L-1,L\}, \qquad (1)$$

where the index j = 1, . . . , N denotes the elements of each vector and k = 1, . . . , K 
the hidden units. The outputs of these neurons are defined by the scalar product of 
inputs and weights, 

$$\sigma_k = \operatorname{sgn}(\mathbf{w}_k \cdot \mathbf{x}_k) . \qquad (2)$$

The final output bit of each TPM is defined by the product of the hidden units,

$$\tau = \prod_{k=1}^{K} \sigma_k . \qquad (3)$$

Figure 1. A tree parity machine with K = 3 and N = 4.

Both partners A and B initialize their weight vectors by means of random numbers
before the training period starts. At each time step t a public input vector is generated
and the bits $\tau^A$ and $\tau^B$ are exchanged over the public channel. In the case of identical
output bits, $\tau^A = \tau^B$, each TPM adjusts those of its weights for which the hidden
unit is identical to the output, $\sigma_k^{A/B} = \tau^{A/B}$. These weights are adjusted according
to a given learning rule. Here we consider the Hebbian rule

$$\mathbf{w}_k^{A/B}(t+1) = \mathbf{w}_k^{A/B}(t) + \tau^{A/B} \mathbf{x}_k . \qquad (4)$$

After some time $t_{\mathrm{sync}}$ the two partners are synchronized, $\mathbf{w}_k^A(t) = \mathbf{w}_k^B(t)$, and the
communication is stopped. Then the common weight vector is used as a key to encrypt
secret messages.
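
The protocol described by (1) to (4) can be sketched in a few lines of Python. This is only a minimal illustration, not the simulation code behind the results reported here. Several details are assumptions made for the sketch: a vanishing local field is mapped to the output +1, weights that would leave the range of (1) are clipped to the boundary values, and the function names as well as the small default parameters are hypothetical.

```python
import random


def sign(value):
    # Assumption: a zero local field is mapped to +1 so that sigma is always +/-1.
    return 1 if value >= 0 else -1


def tpm_output(weights, inputs):
    """Return (sigma, tau) of one tree parity machine, following (2) and (3)."""
    sigma = [sign(sum(w * x for w, x in zip(w_k, x_k)))
             for w_k, x_k in zip(weights, inputs)]
    tau = 1
    for s in sigma:
        tau *= s
    return sigma, tau


def hebbian_update(weights, inputs, sigma, tau, depth):
    """Apply (4) in place to every hidden unit whose output equals tau."""
    for w_k, x_k, s in zip(weights, inputs, sigma):
        if s == tau:
            for j, x in enumerate(x_k):
                # Assumption: weights are clipped so that they stay in [-L, L].
                w_k[j] = max(-depth, min(depth, w_k[j] + tau * x))


def synchronize(K=3, N=10, L=3, seed=1, max_steps=20000):
    """Mutual learning of A and B with public random inputs."""
    rng = random.Random(seed)

    def new_tpm():
        return [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]

    w_a, w_b = new_tpm(), new_tpm()
    for step in range(1, max_steps + 1):
        x = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(K)]
        sigma_a, tau_a = tpm_output(w_a, x)
        sigma_b, tau_b = tpm_output(w_b, x)
        if tau_a == tau_b:  # weights move only if the output bits agree
            hebbian_update(w_a, x, sigma_a, tau_a, L)
            hebbian_update(w_b, x, sigma_b, tau_b, L)
        if w_a == w_b:
            return step, w_a  # common weights serve as the key
    return None, None  # not synchronized within max_steps
```

Calling `synchronize()` returns the number of steps needed and the common weight vectors, or `(None, None)` if the step limit is reached first.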

Note that any possible attacker E knows as much about the process as A knows 
about B and vice versa. But E has some disadvantage with respect to A and B: it can 
only listen to the communication and cannot influence the dynamics of the weights in 
A's and B's neural networks [2, 14]. It turns out that this difference determines the 
security of the crypto-system. 

It seems obvious that a successful attacker should also use a TPM with similar 
training rules. In fact, it is advantageous to use many networks for an attack. Since 
the method is stochastic (due to the random input) the attacker may synchronize as 
well, but with a low probability $P_E$. The security of neural cryptography is related to
the fact that $P_E$ decreases exponentially with the value of L [9].

In all previous learning rules the sequence of input vectors $\mathbf{x}_k(t)$ was generated
by a public random number generator. Here we propose to take queries, i.e. input
vectors which are correlated with the present weight vector $\mathbf{w}_k(t)$. At odd (even) time
steps the partner A (B) generates an input vector which has a certain overlap with
its weights $\mathbf{w}_k^A$ ($\mathbf{w}_k^B$). It turns out that queries improve the security of the system.



3. Queries 



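The process of synchronization itself can be described by standard order parameters
which are also used for the analysis of on-line learning [13]. These order parameters
are

$$Q_k^m = \frac{1}{N} \mathbf{w}_k^m \cdot \mathbf{w}_k^m , \qquad (5)$$

$$R_k^{m,n} = \frac{1}{N} \mathbf{w}_k^m \cdot \mathbf{w}_k^n , \qquad (6)$$

where the indices $m, n \in \{A, B, E\}$ denote A's, B's, or E's TPM. The distance between
two corresponding hidden units is defined by the (normalized) overlap

$$\rho_k^{m,n} = \frac{\mathbf{w}_k^m \cdot \mathbf{w}_k^n}{\sqrt{\mathbf{w}_k^m \cdot \mathbf{w}_k^m}\,\sqrt{\mathbf{w}_k^n \cdot \mathbf{w}_k^n}} = \frac{R_k^{m,n}}{\sqrt{Q_k^m Q_k^n}} . \qquad (7)$$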
The maximum value $\rho_k = 1$ is reached for fully synchronized hidden units (zero
distance), while uncorrelated weight vectors have zero overlap (maximal distance).

The distance between two corresponding hidden units decreases if the learning
rule (4) is applied to both of them using the same input vector $\mathbf{x}_k$. Thus coordinated
moves of the weights have an attractive effect in the process of synchronization [6, 15].

But changing the weights in only one hidden unit increases the average distance
[6]. These repulsive steps between two corresponding hidden units can only occur
if their output bits are different [16]. The probability for this event is given by the
well-known generalization error of the perceptron [13]

$$\epsilon_k = \frac{1}{\pi} \arccos \rho_k , \qquad (8)$$

which depends on the overlap $\rho_k$ between the weight vectors of corresponding hidden
units. For an attacker using the same learning rule as A and B, repulsive steps in the
kth hidden unit occur with probability $p_r = \epsilon_k$, as E cannot influence the process of
synchronization.

In contrast, the partners update the weights in their TPMs only if $\tau^A = \tau^B$. In
the case of identical generalization error, $\epsilon_k = \epsilon$, we find for K = 3 that repulsive
steps in a hidden unit of B's neural network occur with the probability [16]

$$p_r = \frac{2(1-\epsilon)\epsilon^2}{(1-\epsilon)^3 + 3(1-\epsilon)\epsilon^2} \le \epsilon . \qquad (9)$$

So $p_r$ is lower for synchronization than for learning and the partners partially avoid
repulsive steps. This advantage makes neural cryptography feasible and prevents
successful attacks which are based only on simple learning.
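
The difference between (8) and (9) is easy to check numerically. The following short sketch evaluates both expressions for a few overlaps; the function names are hypothetical and the chosen overlaps are arbitrary illustration values.

```python
import math


def generalization_error(rho):
    """Equation (8): probability of different output bits for overlap rho."""
    return math.acos(rho) / math.pi


def repulsive_probability_partners(eps):
    """Equation (9): repulsive steps for the partners (K = 3, equal errors)."""
    # Probability that the two output bits agree: either no hidden unit
    # disagrees or exactly two of the three hidden units disagree.
    agree = (1 - eps) ** 3 + 3 * (1 - eps) * eps ** 2
    return 2 * (1 - eps) * eps ** 2 / agree


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        eps = generalization_error(rho)
        p_sync = repulsive_probability_partners(eps)
        print(f"rho={rho:.1f}  attacker p_r={eps:.3f}  partners p_r={p_sync:.3f}")
```

Dividing (9) by $\epsilon$ gives $2\epsilon / (1 - 2\epsilon + 4\epsilon^2)$, which is at most 1 because $(2\epsilon - 1)^2 \ge 0$; equality holds only for $\epsilon = 1/2$, i.e. for uncorrelated hidden units.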

But E can assign a confidence level to each output $\sigma_k^E$ of its hidden units. For
this task the local field

$$h_k = \frac{1}{\sqrt{N}} \mathbf{w}_k \cdot \mathbf{x}_k \qquad (10)$$

is used as additional information. Then the prediction error, the probability of different
output bits for an input vector $\mathbf{x}_k$ inducing a local field $h_k$, is given by [17]



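$$\epsilon(\rho_k, h_k) = \frac{1}{2} \left[ 1 - \operatorname{erf}\left( \frac{\rho_k}{\sqrt{2(1-\rho_k^2)}} \frac{|h_k|}{\sqrt{Q_k}} \right) \right] . \qquad (11)$$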



The function $\epsilon(\rho_k, h_k)$ reaches a maximum of $\epsilon(\rho_k, 0) = 0.5$ for $h_k = 0$. In this case $\mathbf{x}_k$
is perpendicular to $\mathbf{w}_k$ and the neural network has no information about this example.
But for increasing $|h_k|$ the confidence of prediction rises and $\epsilon(\rho_k, h_k)$ shrinks.
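
The behaviour of the prediction error can be made concrete with a small function. The sketch below assumes the standard form of (11), $\epsilon(\rho_k, h_k) = \frac{1}{2}[1 - \operatorname{erf}(\rho_k |h_k| / \sqrt{2(1-\rho_k^2) Q_k})]$; the function name and the sample values are hypothetical.

```python
import math


def prediction_error(rho, h, Q=1.0):
    """Prediction error (11) for overlap rho, local field h and order parameter Q."""
    if not -1.0 < rho < 1.0:
        raise ValueError("the expression is only evaluated for |rho| < 1")
    argument = rho / math.sqrt(2.0 * (1.0 - rho ** 2)) * abs(h) / math.sqrt(Q)
    return 0.5 * (1.0 - math.erf(argument))


if __name__ == "__main__":
    # For a fixed positive overlap the error shrinks as |h| grows.
    for h in (0.0, 0.5, 1.0, 2.0):
        print(f"h={h:.1f}  error={prediction_error(0.5, h):.3f}")
```

For $h_k = 0$ the function returns 0.5 for every overlap, which is the case of an input perpendicular to the weight vector.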

This effect is essential for the Geometric Attack [6], which is, up to now, the
most successful method for a single attacking neural network with a structure identical
to A's and B's [9]. Here E trains its own TPM with the examples, input vectors
and output bits, transmitted by the two partners. If $\tau^E = \tau^A$, the attacker applies
the same learning rule as A and B. But for $\tau^E \neq \tau^A$, E knows that at least one of
the hidden units has made a wrong prediction and searches for the unit k with the
minimum absolute value $|h_k|$ of the local field. Because this hidden unit has the
lowest confidence of prediction, its output $\sigma_k^E$ is most likely to be different from $\sigma_k^A$.
Therefore the attacker inverts both $\sigma_k^E$ and $\tau^E$. Afterwards the usual learning rule
can be applied. As this geometric attack method reduces the frequency of repulsive
steps, E increases its success probability $P_E$ by taking the local field into account.
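
One step of the Geometric Attack can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the simulations: the function name is hypothetical, a vanishing local field is mapped to +1, weights are clipped to the range allowed by (1), and the caller is expected to invoke the function only in time steps in which A and B update their weights.

```python
def geometric_attack_step(weights_e, inputs, tau_a, depth):
    """Update E's weights in place for one step; return True if a unit was flipped."""
    # Unnormalized local fields; the factor 1/sqrt(N) does not change the ranking.
    fields = [sum(w * x for w, x in zip(w_k, x_k))
              for w_k, x_k in zip(weights_e, inputs)]
    sigma = [1 if h >= 0 else -1 for h in fields]  # assumption: sgn(0) = +1
    tau_e = 1
    for s in sigma:
        tau_e *= s

    flipped = tau_e != tau_a
    if flipped:
        # Invert the hidden unit with the lowest confidence |h_k|, and tau_e with it.
        k = min(range(len(fields)), key=lambda i: abs(fields[i]))
        sigma[k] = -sigma[k]
        tau_e = -tau_e

    # Hebbian rule (4), applied to the hidden units that agree with the output.
    for w_k, x_k, s in zip(weights_e, inputs, sigma):
        if s == tau_e:
            for j, x in enumerate(x_k):
                # Assumption: weights are clipped so that they stay in [-L, L].
                w_k[j] = max(-depth, min(depth, w_k[j] + tau_e * x))
    return flipped
```

After the flip, E's output bit agrees with $\tau^A$ by construction, so the update uses the same rule as the partners.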
Figure 2. Prediction error $\epsilon(\rho, h)$ as a function of the overlap $\rho$ for different
values of the local field. The generalization error $\epsilon$ is shown as a thick line. We
assumed Q = 1 for this diagram.



But the partners can influence the local field. Instead of using random inputs,
they are able to select input vectors with a fixed $|h_k|$ (queries [12]) in their own
hidden units. This essentially modifies the functional dependency between the overlap $\rho_k$
and the frequency of repulsive steps $p_r$. In this case (11) determines the probability of
different output bits instead of (8).

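Note that the chosen absolute local field $|h_k|$ for synchronization with queries is
lower than the average value

$$\langle |h_k| \rangle = \sqrt{2 Q_k / \pi} \approx 0.8 \sqrt{Q_k} \qquad (12)$$

observed for random inputs. Therefore the overlap

$$\rho_{k,\mathrm{in}} = \frac{\mathbf{w}_k \cdot \mathbf{x}_k}{\sqrt{\mathbf{w}_k \cdot \mathbf{w}_k}\,\sqrt{\mathbf{x}_k \cdot \mathbf{x}_k}} = \frac{1}{\sqrt{N}} \frac{h_k}{\sqrt{Q_k}} \qquad (13)$$

between input vector and weight vector converges to zero in the limit $N \to \infty$, even
if queries with $0 < |h_k| < \infty$ are used.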
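
As a rough numerical illustration of (13), take the hypothetical values N = 1000, L = 10 and $h_k = H = 5$, and assume uniformly distributed weights so that $Q_k \approx L(L+1)/3$ as in (14). These numbers are chosen for illustration only:

```latex
\rho_{k,\mathrm{in}}
  = \frac{1}{\sqrt{N}} \frac{h_k}{\sqrt{Q_k}}
  \approx \frac{5}{\sqrt{1000}\,\sqrt{110/3}}
  \approx \frac{5}{31.6 \times 6.06}
  \approx 0.026
```

So even a query with a sizeable local field is almost perpendicular to the weight vector for this system size.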

Figure 2 shows the prediction error $\epsilon(\rho, h)$ for queries which induce the local field
h. Compared to the case of random inputs, $\epsilon(\rho, h)$ is increased for small overlap and
decreased for nearly synchronized neural networks. By selecting queries with different
local fields, A and B can regulate this effect.

As learning is slower than synchronization, $\rho^{AE}$ is typically smaller than $\rho^{AB}$. In
this situation queries with small values of the parameter h increase the frequency of
repulsive steps for the attacker without affecting the process of synchronization too
much.



4. Synchronization 

In order to integrate queries into the neural key-exchange protocol [1], only the
generation of the inputs has to be changed. Instead of choosing completely random
$x_{k,j}$, both partners alternately ask each other questions. In each time step either A or
B uses the algorithm described in Appendix A to generate K input vectors $\mathbf{x}_k$, which
result in $h_k^{A/B} = \pm H$. Then these queries are sent to the other partner and both
calculate the output of their TPMs. After the exchange of $\tau^A$ and $\tau^B$, the Hebbian
learning rule is used to update the weights as described in section 2. This leads to
synchronization after $t_{\mathrm{sync}}$ steps.

Figure 3. Synchronization time of two TPMs with K = 3 and N = 1000,
averaged over 10 000 simulations.

As shown in figure 3, $\langle t_{\mathrm{sync}} \rangle$ diverges for $H \to 0$. In this limit the prediction error
$\epsilon(\rho_k, H)$ reaches 0.5 independent of $\rho_k$. Therefore the effect of the repulsive steps
inhibits synchronization. But the choice of H does not influence $\langle t_{\mathrm{sync}} \rangle$ much, as long
as this quantity is large enough.

The dependence on L of $\langle t_{\mathrm{sync}} \rangle$ is caused by two effects:

• The dynamics of each weight can be described as a random walk with reflecting
boundaries if the control signals $\sigma_k$ and $\tau$ are neglected. Calculations for this
simplified model of neural synchronization show that $\langle t_{\mathrm{sync}} \rangle$ scales proportional
to $L^2$ [15]. This behavior has been observed in neural cryptography with random
inputs [9], too.

• If queries are used, the probability of repulsive steps $p_r$ depends not only on the
overlap $\rho_k$, but also on the quantity $|h_k| / \sqrt{Q_k}$. Assuming uniformly distributed
weights, we find

$$\langle Q_k \rangle = \left\langle \frac{1}{N} \mathbf{w}_k \cdot \mathbf{w}_k \right\rangle = \frac{1}{3} L (L+1) \approx \frac{1}{3} L^2 \qquad (14)$$

for the expectation value of the order parameter $Q_k$. Hence we have to increase
H proportional to the synaptic depth L, if we want to observe the same frequency
of repulsive steps.
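
The expectation value (14) can be verified exactly by averaging $w^2$ over the $2L+1$ allowed weight values. The small check below uses exact fractions; the function name is hypothetical.

```python
from fractions import Fraction


def mean_Q(L):
    """Exact <Q_k> for weights drawn uniformly from {-L, ..., L}."""
    values = range(-L, L + 1)
    return Fraction(sum(w * w for w in values), 2 * L + 1)


if __name__ == "__main__":
    for L in (1, 5, 10, 50):
        exact = mean_Q(L)
        # The ratio to L^2 / 3 approaches 1 for large L.
        print(L, exact, float(exact) / (L * L / 3))
```

Since $\sqrt{\langle Q_k \rangle} \approx L / \sqrt{3}$ for large L, keeping $|h_k| / \sqrt{Q_k}$ fixed indeed requires H to grow linearly with L.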

Using both $\langle t_{\mathrm{sync}} \rangle \propto L^2$ and $H \propto L$ we can rescale $\langle t_{\mathrm{sync}} \rangle_{H,L}$ in order to obtain
functions $f_L(\alpha)$, which are nearly independent of the synaptic depth in the case $L \gg 1$:

$$\langle t_{\mathrm{sync}} \rangle = L^2 f_L\left( \frac{H}{L} \right) . \qquad (15)$$

In figure 4 we have plotted these functions for different values of L. It is clearly
visible that $f_L(\alpha)$ converges to a universal scaling function $f(\alpha)$ in the limit $L \to \infty$:

$$f(\alpha) = \lim_{L \to \infty} f_L(\alpha) . \qquad (16)$$

Additionally, we find that the distance $|f_L(\alpha) - f(\alpha)|$ shrinks proportional to $L^{-1}$.
Therefore we can use finite size scaling to determine $f(\alpha)$, which is shown in figure 4,
too.
Figure 4. Scaling behavior of the synchronization time. The thick curve denotes
the universal function $f(\alpha)$ defined in (16). It has been obtained by finite size
scaling, which is shown in the inset for $\alpha = 0.5$ (○) and $\alpha = 0.6$ (□).

Figure 5. Extrapolation of $f_L^{-1}$ to $L \to \infty$. Symbols denote the values extracted
from figure 4 for the average synchronization times $100 L^2$ (○), $150 L^2$ (□) and
$200 L^2$ (◇).



This function diverges for $\alpha < \alpha_c$. We estimate $\alpha_c \approx 0.36$ for K = 3 and N = 1000
by extrapolating the inverse function $f^{-1}$ as shown in figure 5. Consequently,
synchronization is only achievable for $H > \alpha_c L$ in the limit $L \to \infty$.

5. Security 

Up to now, the most successful attack on neural cryptography is the Majority Flipping
Attack [11], which is an extension of the Geometric Attack. Instead of a single neural
network, E uses an ensemble of M TPMs. At the beginning, the weight vectors of all
attacking networks are chosen randomly, so that the average initial overlap between
them is zero. Like A and B, the attacker only updates the weights in time steps
with $\tau^A = \tau^B$. If the output $\tau^{E,m}$ of the mth attacking network disagrees with $\tau^A$,
the hidden unit with the smallest absolute value $|h_k^{E,m}|$ of the local field is selected.
Then the output bits $\sigma_k^{E,m}$ and $\tau^{E,m}$ are inverted. Afterwards E counts the internal
representations $(\sigma_1^{E,m}, \dots, \sigma_K^{E,m})$ and selects the most common one. This majority
vote is then adopted by all attacking networks for the application of the learning rule.
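
The majority vote itself is a simple counting step. The sketch below assumes that the flipping step has already been applied to each attacking network, so that every internal representation is consistent with the transmitted output bit. The function names are hypothetical, and clipping the weights to the range of (1) is an assumption of this illustration.

```python
from collections import Counter


def majority_representation(representations):
    """Return the most common internal representation among E's M networks."""
    counts = Counter(tuple(rep) for rep in representations)
    winner, _ = counts.most_common(1)[0]
    return list(winner)


def apply_majority_update(ensemble, inputs, representation, depth):
    """Apply the Hebbian rule with the same representation to all networks."""
    tau = 1
    for s in representation:
        tau *= s
    for weights in ensemble:
        for w_k, x_k, s in zip(weights, inputs, representation):
            if s == tau:
                for j, x in enumerate(x_k):
                    # Assumption: weights are clipped so that they stay in [-L, L].
                    w_k[j] = max(-depth, min(depth, w_k[j] + tau * x))
```

Because every network receives the same update, the members of the ensemble move in the same direction in each such step.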
Figure 6. Success probability of the Majority Flipping Attack as a function of
H. Symbols denote the results obtained in 10 000 simulations for K = 3, M = 100
and N = 1000. The inset shows the success of a Geometric Attack for K = 3,
L = 5 and N = 1000.



Because of these identical updates, E's neural networks become correlated [11], 
which reduces the efficiency of the Majority Flipping Attack. In order to slow down 
this effect, we use the modifications proposed in [11]: 

• The majority vote is only considered in even time steps. Otherwise E updates the 
weights according to the internal representation of the particular neural network. 

• At the beginning of the synchronization, only the Geometric Attack is applied by
the attacker. But instead of waiting for a fixed fraction of $t_{\mathrm{sync}}$ to elapse, as
suggested in [11], E starts with the Majority Flipping Attack after 100 steps.

For the neural key-exchange protocol with random inputs and Hebbian learning
rule, the success probability $P_E$ of this method reaches a constant non-vanishing value
in the limit $L \to \infty$ [11]. Therefore it breaks the security of this type of neural
cryptography.

In contrast, if the two partners A and B use queries for the neural key exchange,
the success probability strongly depends on the parameter H. This can be used to
regain security against the Majority Flipping Attack. As shown in figure 6, a Fermi-Dirac
distribution

$$P_E = \frac{1}{1 + \exp(-\beta (H - \mu))} \qquad (17)$$

with two parameters $\beta$ and $\mu$ is a suitable fitting function for describing $P_E$ as a
function of H. Equation (17) is also valid for the Geometric Attack, which is a special
case of the Majority Flipping Attack (M = 1).

Figure 7 shows the dependence on L of the fit parameter $\mu$ for both attacks.
Aside from finite size effects for small values of L, this parameter is proportional to
the synaptic depth of the TPMs:

$$\mu = \alpha_s L . \qquad (18)$$

Obviously, the quantity $\alpha = H/L$ not only determines the synchronization time but
also the success of an attack. These effects are caused by the modification of $p_r$ due
to the use of queries. The other parameter $\beta$ is nearly constant for L > 3. So both
$\alpha_s$ and $\beta$ are independent of the chosen parameters H and L, but depend on the
attacker's method.







Figure 7. Parameters $\mu$ and $\beta$ as a function of the synaptic depth L. Symbols
denote the results of fits using (17) for the Geometric Attack (○) and the Majority
Flipping Attack with M = 100 (□). From these values we obtain $\alpha_{s,\mathrm{flip}} \approx 0.45$
and $\alpha_{s,M=100} \approx 0.38$ according to (18).



From these results we can also deduce the scaling behavior of $P_E$ as a function of
the synaptic depth. For both the Majority Flipping Attack and the Geometric Attack
we obtain

$$P_E = \frac{1}{1 + \exp(\beta (\alpha_s - \alpha) L)} \qquad (19)$$

for synchronization with queries and $H = \alpha L$. In the limit $L \to \infty$, the asymptotic
behavior of $P_E$ is given by

$$P_E \sim e^{-\beta (\alpha_s - \alpha) L} \qquad (20)$$

as long as $\alpha < \alpha_s$. Here the success probability $P_E$ decreases exponentially with
increasing synaptic depth,

$$P_E \propto e^{-yL} , \qquad (21)$$

which is also observed in the case of random inputs, if E uses the Geometric Attack [9].
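
The relation between the fit function (17), the scaling (18) and the exponential law (20) can be illustrated numerically. In the sketch below the function names are hypothetical, $\alpha_s = 0.38$ is the value quoted in the caption of figure 7 for M = 100, and $\beta = 10$ is a purely hypothetical value chosen for illustration, since the fitted values of $\beta$ are not stated in the text.

```python
import math


def success_probability(H, L, beta, alpha_s):
    """Fit function (17) with mu = alpha_s * L taken from (18)."""
    return 1.0 / (1.0 + math.exp(-beta * (H - alpha_s * L)))


def asymptotic_success_probability(alpha, L, beta, alpha_s):
    """Exponential law (20), valid for alpha < alpha_s and large L."""
    return math.exp(-beta * (alpha_s - alpha) * L)


if __name__ == "__main__":
    beta, alpha_s, alpha = 10.0, 0.38, 0.30  # beta is a hypothetical value
    for L in (5, 10, 20, 40):
        exact = success_probability(alpha * L, L, beta, alpha_s)
        approx = asymptotic_success_probability(alpha, L, beta, alpha_s)
        print(f"L={L:2d}  P_E={exact:.3e}  asymptotic={approx:.3e}")
```

With these numbers the exponent in (21) is $y = \beta (\alpha_s - \alpha) = 0.8$, and the two expressions agree closely once $\beta (\alpha_s - \alpha) L$ is large.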

Consequently, the neural key exchange with queries is secure against both
attacks. An attacker using the Majority Flipping Attack only decreases the value
of $y = \beta (\alpha_s - \alpha)$, because $\alpha_{s,\mathrm{majority}} < \alpha_{s,\mathrm{flip}}$, but does not change the exponential
scaling law.

For practical aspects of security, however, one has to look at the function
$P_E(\langle t_{\mathrm{sync}} \rangle)$, which is shown in figure 8. Here we find that queries enhance the security
of the neural key-exchange protocol considerably for a given synchronization time. There is an
optimal value of H associated with each L, which lies on the envelope of all functions
$P_E(\langle t_{\mathrm{sync}} \rangle)$.

Figure 8 also shows that A and B can achieve higher security against attacks than
predicted by (19) with $\alpha = \alpha_c$ as long as L is not too large. This phenomenon is caused
by finite size effects which enable synchronization even for $H < \alpha_c L$. But, of course,
this does not work for $L \gg 1$. Therefore the envelope and $P_E(\langle t_{\mathrm{sync}} \rangle)$ for $H = \alpha_c L$
converge asymptotically in the limit $L \to \infty$.

Queries reveal additional information about the weight vectors of A's and B's
neural networks. However, an attacker E cannot benefit from this information, since
for a given value of H, there is still an exponentially large number of weight vectors $\mathbf{w}_k$
which are consistent with a given query. As an example, there are $2.8 \times 10^{129}$ possible
weight vectors for L = 10, N = 100, and $h_k = 10$. Because of this large number, E
cannot gain useful information from queries.

Figure 8. Success probability as a function of the average synchronization time
for K = 3, N = 1000 and different values of L and H. Part (a) shows the result
for the Geometric Attack and part (b) for the Majority Flipping Attack with
M = 100 attacking neural networks. The solid curve in each graph represents the
success probability for neural cryptography with random inputs and the dashed
line marks $H = \alpha_c L$.

6. Conclusions 

Neural cryptography is a subtle competition between interaction and learning of neural 
networks. Two neural networks A and B exchange some information about their states 
over a public channel. The amount of information has to be so high that the two 
networks A and B can synchronize. But it has to be so low that an attacker E can 
only synchronize with a low probability which can be decreased to an arbitrarily low
value.

We have shown that increasing the amount of exchanged information can be of
advantage for cryptography. We have included queries in the training process of the
neural networks. This means that A and B alternately generate an input which
is correlated with the sender's own state and ask the partner for the corresponding
output bit. The overlap between input and weight vector is so low that the additional
information does not reveal much about the internal states. But queries introduce a
mutual influence between A and B which is not available to an attacking network E.
In addition the method obtains a new (public) parameter which can be adjusted to
give optimal security.

We have applied this new method to the case of the Hebbian training rule which 
was successfully attacked using the majority of an ensemble of attackers. We find 
that queries restore the security of the method: the probability of a successful naive 
majority attack can be decreased to any desired level. 



Appendix A. Generation of queries 



In this appendix we describe the algorithm used to generate a query $\mathbf{x}_k$, which induces
a previously chosen local field $h_k$. The solution, of course, depends on the weight vector
$\mathbf{w}_k$ of the hidden unit. This task is similar to solving a knapsack problem [18], which
can be very difficult. But we need only a fast approximate solution in order to use
queries in the neural key-exchange protocol.

As both weights $w_{k,j}$ and inputs $x_{k,j}$ are discrete, there are only 2L+1 possibilities
for $w_{k,j} x_{k,j}$. Therefore we can describe the solution by counting the number $c_{k,l}$ of
products with $w_{k,j} x_{k,j} = l$. Then the local field is given by:



$$h_k = \frac{1}{\sqrt{N}} \sum_{l=1}^{L} l \, (c_{k,l} - c_{k,-l}) . \qquad (A.1)$$



We also note that the sum $n_{k,l} = c_{k,l} + c_{k,-l}$ is equal to the number of weights with
$|w_{k,j}| = |l|$. Hence the values of $n_{k,l}$ depend only on the weight vector $\mathbf{w}_k$. This can
be used to write $h_k$ as a function of only L variables, because the generation of queries
cannot change $\mathbf{w}_k$:



$$h_k = \frac{1}{\sqrt{N}} \sum_{l=1}^{L} l \, (2 c_{k,l} - n_{k,l}) . \qquad (A.2)$$



In our simulations we use the following algorithm to generate the queries. First
the output $\sigma_k$ of the hidden unit is chosen randomly. Therefore the set value of the
local field is given by $h_k = \sigma_k H$. Then we use either

$$c_{k,l} = \left\lfloor \frac{n_{k,l} + 1}{2} + \frac{1}{2l} \left( \sigma_k H \sqrt{N} - \sum_{j=l+1}^{L} j \, (2 c_{k,j} - n_{k,j}) \right) \right\rfloor \qquad (A.3)$$

or

$$c_{k,l} = \left\lceil \frac{n_{k,l} - 1}{2} + \frac{1}{2l} \left( \sigma_k H \sqrt{N} - \sum_{j=l+1}^{L} j \, (2 c_{k,j} - n_{k,j}) \right) \right\rceil \qquad (A.4)$$

to compute the values of $c_{k,L}, c_{k,L-1}, \dots, c_{k,1}$. In each calculation one of the two
equations is selected randomly with equal probability, so that rounding errors do
not influence the average result. Additionally, we have to take into account that
$0 \le c_{k,l} \le n_{k,l}$. Therefore we set $c_{k,l}$ to the nearest boundary value, if (A.3) or (A.4)
yield a result outside this range.

Afterwards the input vector $\mathbf{x}_k$ is generated. Inputs associated with zero weights
are chosen randomly, because they do not influence the local field. The other input
bits are divided into L groups according to the absolute value $l = |w_{k,j}|$ of their
corresponding weight. In each group, $c_{k,l}$ inputs are selected randomly and set to
$x_{k,j} = \operatorname{sgn}(w_{k,j})$. The remaining $n_{k,l} - c_{k,l}$ input bits are set to $x_{k,j} = -\operatorname{sgn}(w_{k,j})$.

Simulations show that the absolute local field $|h_k|$ matches its set value H on
average. We observe only small deviations, which are caused by the restriction
of the input values to +1 or −1. So this algorithm generates queries which approximately
induce a predetermined absolute local field H.
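
The procedure of this appendix can be sketched in Python as follows. This is a minimal illustration, not the original simulation code. It assumes that the two alternatives (A.3) and (A.4) amount to rounding $n_{k,l}/2 + (\sigma_k H \sqrt{N} - \sum_{j>l} j (2c_{k,j} - n_{k,j})) / (2l)$ to the nearest integer, downwards via a floor of the value plus 1/2 or upwards via a ceiling of the value minus 1/2. The function name and its signature are hypothetical.

```python
import math
import random


def generate_query(weights, H, L, rng):
    """Return (x, sigma): an input whose local field is approximately sigma * H."""
    N = len(weights)
    sigma = rng.choice((-1, 1))          # randomly chosen output of the hidden unit
    target = sigma * H * math.sqrt(N)    # desired value of the scalar product w . x

    # Indices of the weights grouped by their absolute value l = 1, ..., L.
    groups = {l: [j for j, w in enumerate(weights) if abs(w) == l]
              for l in range(1, L + 1)}

    counts = {}
    partial = 0  # running sum of j * (2 c_j - n_j) over the groups j > l
    for l in range(L, 0, -1):
        n = len(groups[l])
        rest = (target - partial) / (2 * l)
        # Assumed rounding variants, selected with equal probability.
        if rng.random() < 0.5:
            c = math.floor((n + 1) / 2 + rest)
        else:
            c = math.ceil((n - 1) / 2 + rest)
        c = max(0, min(n, c))            # nearest boundary value if out of range
        counts[l] = c
        partial += l * (2 * c - n)

    x = [0] * N
    for j, w in enumerate(weights):
        if w == 0:
            x[j] = rng.choice((-1, 1))   # zero weights do not affect the field
    for l in range(1, L + 1):
        indices = groups[l][:]
        rng.shuffle(indices)
        for position, j in enumerate(indices):
            s = 1 if weights[j] > 0 else -1
            # The first c_l inputs are aligned with the weight, the rest opposed.
            x[j] = s if position < counts[l] else -s
    return x, sigma
```

For a weight vector with enough entries of every absolute value, the rounding error that remains after the last group $l = 1$ is at most 1 in the scalar product, i.e. at most $1/\sqrt{N}$ in the local field.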